ABSTRACT

BotDB  is  designed  to  encapsulate  the  rapidly  expanding  amount  of  information 
about  the  structure  and  function  of  the  botulinum  (BoNT)  and  tetanus  (TeNT) 
neurotoxins  and  to  track  a  variety  of  basic  and  applied  research  efforts.  The  AceDB 
management  system  was  chosen  for  this  project  because  of  its  flexibility  in  manipulating 
semi-structured  data  sets  and  for  its  information  retrieval  query  languages.  Besides 
storing  amino  and  nucleic  acid  sequences  of  the  clostridial  neurotoxin  genes  and  proteins, 
BotDB  provides  sequence  data  for  new  classes  of  objects  including  neurotoxin  mutants, 
substrates  and  their  mutants,  associated  non-toxic  proteins,  and  C-fragment  vaccine 
candidates.  New  data  types  provide  information  on  detection  assays  for  the  neurotoxins, 
and  on  structural  data  from  X-ray  crystallographic  and  circular  dichroism  spectroscopic 
studies.  Kinetic  parameters  from  biochemical  experiments  include  reaction  rates  for 
substrate  cleavage,  and  block  of  neurotransmission.  The  structures  and  kinetic 
characteristics  of  presently  known  chemical  inhibitors  are  also  being  archived.  All  of 
these  data  are  associated  with  citations  of  the  relevant  literature  for  on-line  annotation. 
Graphics  viewer  programs  are  provided  to  display  stored  images  and  three-dimensional 
representations  of  protein  structures.  BotDB  is  in  the  alpha-test  phase  of  development 
and  will  become  a  publicly  available  web  site. 

INTRODUCTION 

The  neurotoxins  from  Clostridium  botulinum  and  related  species  represent  some 
of  the  most  lethal  substances  known  and  are  the  subject  of  general  and  specialized 
reviews [1]. Only those structural and functional features of these proteins that pertain to
the  description  of  BotDB  will  be  mentioned  here. 

The  seven  immunologically  distinguishable  serotypes  (BoNT/A-G)  cause  flaccid 
paralysis  in  humans  and  experimental  animals  by  preventing  acetylcholine-containing 
vesicles  from  releasing  their  contents  in  a  calcium-mediated  response  to  chemical  or 
electrical  stimuli  at  peripheral  cholinergic  presynaptic  endings.  The  structurally  related 
tetanus neurotoxin (TeNT) from C. tetani, in contrast, causes spastic paralysis due to its
net  disinhibitory  effect  within  the  central  nervous  system. 


Report Documentation Page (Standard Form 298, Rev. 8-98; OMB No. 0704-0188)

Report date: 01 JUL 2003
Report type: N/A
Title: BOTDB: A Database For The Clostridial Neurotoxins
Performing organization: United States Army Medical Research Institute of Infectious Diseases, Division of Toxinology and Aerobiology; Department of Cell Biology and Biochemistry, Fort Detrick, Maryland USA 21702-5011
Distribution/availability: Approved for public release, distribution unlimited
Supplementary notes: See also ADM001523. The original document contains color images.
Security classification (report, abstract, this page): unclassified


Within this family, each neurotoxin is expressed as a single polypeptide chain
having a molecular mass (Mr) of ~150 kDa. For toxicity to occur, the polypeptide
must be enzymatically ‘nicked’ to form a heterodimer of the light (L) chain (Mr ~50 kDa)
and the heavy (H) chain (Mr ~100 kDa), which are covalently linked by a disulfide
bridge. The reduction of this inter-chain bridge is also required for toxicity. The
three-dimensional (3-D) crystal structures of some of these neurotoxins and their fragments
have recently been solved, including those of the BoNT/A and BoNT/B holotoxins.

The L-chains are zinc-dependent proteases that cleave soluble N-ethylmaleimide-sensitive
factor (NSF) attachment protein receptors (SNAREs). BoNT/A, /C1, and /E
cleave the synaptosome-associated protein of 25 kDa (SNAP-25). BoNT/C1 is the only
serotype known to cleave both syntaxin-1a and SNAP-25 [9,10]. The remaining serotypes
and TeNT cleave synaptobrevin (or vesicle-associated membrane protein, VAMP).
Isoforms of synaptosomal-associated proteins that occur in non-neural tissue (e.g.,
cellubrevin and SNAP-23) are also susceptible to cleavage if exposed to the appropriate
neurotoxin serotype.

The H-chain is associated with high-affinity binding of its C-terminal domain
to as yet unidentified ectoacceptors on cholinergic nerve terminals [11]. After
receptor-mediated endocytosis, the translocation of the toxic moiety into the neuroplasm
is mediated by the N-terminal domain of the H-chain. Cation-selective ion channels formed
by this domain are believed to allow the escape of the toxic moiety from low-pH
endocytotic compartments into the neuroplasm.

METHODS 

SOURCES  OF  DATA 

A critical mass of functional and structural information about the BoNTs and
TeNT exists. Presently, 23 precursor sequences from various strains of the seven BoNT
serotypes and TeNT have been deposited in public databases (SwissProt, GenBank [13]).
At this time, four sets of crystal structure coordinates for the holotoxins, three sets for the
L-chain, and eight sets for H-chain C-fragments (with and without ligands) are in the
Protein Data Bank [14] (PDB). At least three complete neurotoxin progenitor genes are
available from GenBank [13]; these contain four to six genes for the non-toxic proteins that
are associated with each of the BoNTs. The number of protein substrates and cleavable
peptides that have been examined in the literature is rapidly increasing [16], and the list
of neurotoxin active-site inhibitors is growing [18].

DESCRIPTION  OF  BOTDB  FEATURES 

BotDB was patterned after aCHEdb, a specialized archive of protein structural
data for the α/β hydrolase fold family, whose members are structurally similar in 3-D space
yet have a variety of sequences and a diverse set of catalytic and non-catalytic functions [19,20].
Both of these databases are controlled by AceDB, an object-oriented-like database
management system originally written by Richard Durbin and Jean Thierry-Mieg for the
genomic data of the nematode C. elegans [23]. In contrast to relational database systems,
where data are stored in tables, AceDB has classes of objects that store the data.
Objects are defined by models that, taken together, comprise the schema for the database.
A key advantage that AceDB offers over relational tables is that it is more
flexible.  Specifically,  AceDB  allows  incomplete  sets  of  diverse  types  of  data,  a  feature 
that  saves  disk  space  and  does  not  degrade  the  speed  or  efficiency  of  the  program.  Also, 
the  models  of  AceDB  can  be  readily  changed  or  expanded  without  rebuilding  the 
database,  a  characteristic  that  is  especially  important  in  the  early,  dynamic  stages  of 
development  when  the  data  structures  are  not  yet  fully  defined.  Moreover,  AceDB  is 
open-source  software  in  contrast  to  commercial-grade  relational  systems  that  can  be 
prohibitively  expensive. 
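
The flexibility described above can be illustrated with a small sketch. It is not AceDB code and does not reproduce the BotDB schema: the `TaggedObject` and `Model` helpers, the tag names, and the file name are hypothetical, chosen only to show objects that store just the tags they have and a model that grows without touching stored objects.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedObject:
    """One named object in a class; only the tags that are known are stored."""
    cls: str
    name: str
    tags: dict = field(default_factory=dict)

    def get(self, tag, default=None):
        # A missing tag simply has no entry; nothing is padded with NULLs.
        return self.tags.get(tag, default)


@dataclass
class Model:
    """The set of tags a class may carry (hypothetical stand-in for a model)."""
    cls: str
    allowed: set = field(default_factory=set)

    def check(self, obj):
        unknown = set(obj.tags) - self.allowed
        if unknown:
            raise ValueError(f"tags not in model {self.cls}: {sorted(unknown)}")
        return True


# Illustrative model and object; tag names and file name are placeholders.
inhibitor_model = Model("Inhibitor", {"Target", "Image"})
tpen = TaggedObject("Inhibitor", "TPEN", {"Image": "tpen.gif"})
inhibitor_model.check(tpen)  # an incomplete object is still valid

# Extending the model later leaves existing objects untouched.
inhibitor_model.allowed.add("Reference")
still_valid = inhibitor_model.check(tpen)
```

In a fixed-column table, the same change would normally mean altering the table definition and carrying empty cells for every object that lacks the new attribute.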

BotDB has thus far been tested with the Microsoft WINDOWS 95, 98, NT and
2000 operating systems using 'ACEDB for Windows' and the aCHEdb schema file as
an initial template for the classes and objects. The main window of BotDB is illustrated
in Figure 1 and shows a partial list of the 25 data classes that are presently available.
Medline citations available through PubMed searches
(www.ncbi.nlm.nih.gov/entrez/query.fcgi) are linked to each data class and to each amino
or nucleic acid sequence.

Figure 1. The main window of BotDB. Top: boxes can be clicked on for database
searching  and  general  administrative  operations.  Bottom:  partial  list  of  Classes  that  the 
user  can  choose  for  more  detailed  information. 


TYPES  OF  DATA  WITHIN  BOTDB 

One portion of this database comprises the nucleic and amino acid
sequences of the neurotoxins and their corresponding SNARE substrates. Figure 2 shows
one protein feature that can be viewed in this type of graphic display: the location of a
hydrophobic, potential channel-forming region [26-28], given either in absolute terms in
the highlighted amino acid sequence or by its relative position along the vertical bar.


Figure  2.  Amino  acid  sequence  display. 

This  search  result  shows  a  portion  of  the  1295  residue  sequence  of  BoNT/A  with 
a  vertically  displayed  hydropathicity  plot  and  a  23-mer  region  corresponding  to  a 
channel-forming  peptide. 
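
A hydropathicity profile of the kind displayed in Figure 2 can be approximated with a sliding-window average. The sketch below is only an illustration: the source does not say which hydropathy scale or window size the display uses, so the Kyte-Doolittle scale, the default window sizes, and the function names are assumptions.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not stated in the source).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of every window of the given length along the sequence."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def most_hydrophobic_window(sequence, window=23):
    """Return (0-based start index, mean score) of the most hydrophobic window."""
    profile = hydropathy_profile(sequence, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, profile[start]
```

Calling `most_hydrophobic_window(seq, window=23)` on a full-length sequence would point to a candidate 23-residue hydrophobic stretch; whether that stretch corresponds to the channel-forming peptide in the figure would have to be checked against the cited literature.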


The sequences are obtained from publicly accessible databases: NCBI’s GenBank
and ExPASy’s SwissProt. Gene and protein fragments and the C-fragment vaccine
candidates are included along with sequences of the nontoxic proteins that form
large complexes with the BoNTs. A limited set of restriction enzymes that target sites on
DNA is also featured (Figure 3).

Figure 3. Nucleic acid sequence display.

This output shows the partial sequence of the progenitor gene of BoNT/B with
the toxin and the five non-toxin genes (far left). The short horizontal lines on the vertical
yellow bar represent cleavage sites for the indicated restriction enzymes (AvaI, HindIII,
AsuII, and ApaLI). The vertical boxes on either side of the yellow bar indicate that the
genes for ‘bont/b-2’, ‘ntnh’, and ‘p-21’ are coded on the opposite strand with respect to
the other three genes. The highlighted portion of the DNA sequence at the bottom
represents a portion of the ‘bont/b-2’ gene. The ‘ntnh’ sequence “aagctt” highlighted in
red corresponds to the motif recognized by the HindIII restriction enzyme.
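
Locating recognition motifs such as the HindIII site in a DNA sequence can be sketched as a simple pattern search. This is an illustration, not the routine AceDB uses: the recognition sequences are the standard ones for these four enzymes and are supplied here rather than taken from BotDB, and the function name is hypothetical. All four sites are palindromic, so scanning one strand is sufficient.

```python
import re

# Recognition sequences written 5'->3'; Y = C or T, R = A or G.
RECOGNITION_SITES = {
    "AvaI": "CYCGRG",
    "HindIII": "AAGCTT",
    "AsuII": "TTCGAA",
    "ApaLI": "GTGCAC",
}

# Minimal IUPAC-to-regex mapping for the codes used above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "[CT]", "R": "[AG]"}


def find_sites(dna, enzymes=RECOGNITION_SITES):
    """Return {enzyme: [1-based start positions of each recognition motif]}."""
    dna = dna.upper()
    hits = {}
    for name, site in enzymes.items():
        pattern = "".join(IUPAC[base] for base in site)
        # A lookahead reports overlapping occurrences as well.
        hits[name] = [m.start() + 1 for m in re.finditer(f"(?={pattern})", dna)]
    return hits
```

The positions returned are where each motif begins, not the exact cut position within the site.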

As  in  the  aCHEdb,  inhibitors  of  enzymatic  activity  of  the  neurotoxins  (e.g., 
Figure  4)  will  also  constitute  a  continually  expanding  segment  of  BotDB.  Data  from 
coordinate  files  for  3-D  structures  will  also  be  included  for  display  (Figure  5). 

Figure 4. Candidate inhibitor of BoNT activity.


The arrows between the overlapping windows indicate the user’s order of
selection. The cascade of windows starts in the back. The desired class, INHIBITOR in
this case, is highlighted in the main BotDB window, followed by TPEN in the MAIN
KEYSET window. Selecting the ‘pick me to call’ label opens the window containing
the chemical representation of TPEN within the GifViewer.

Figure 5. 3-D representation of BoNT/A.


In  this  example,  the  cascade  of  windows  starts  in  the  back  while  the  red  arrows 
indicate  the  item  selected  by  the  user.  Following  the  red  arrows  within  this  cascade 
illustrates  how  the  user  can  highlight  the  class  Botdb  in  the  main  BotDB  window,  and 
highlight  BoNT/A  in  the  MAIN  KEYSET  window  to  obtain  the  Structure  “3BTA”. 
Selecting  the  ‘pick  me  to  call’  label  brings  up  the  final  window  that  has  the  molecular 
representation  of  the  PDB  file  3BTA  as  visualized  using  RASMOL. 

Unlike aCHEdb, BotDB includes lethality and detection assay data for the BoNTs as a new
class of data objects (Figure 6). Each bioassay is characterized by its
serospecificity,  sensitivity,  the  measures  used  (e.g.,  optical  density;  agglutination), 
matrices  (e.g.,  buffers),  animal  species,  and  a  key  reference.  An  example  output  from  this 
class  of  data  is  illustrated  in  the  description  of  the  query  features  below. 

Beyond merely archiving PDB coordinates of atomic positions from X-ray
crystallographic data, BotDB includes a new class of data objects for protein secondary
structures. Secondary structural information can be stored in several ways. First, BotDB
can store images that display the locations of secondary structural elements (helices and
strands) with respect to a given amino acid sequence. These "maps" are created from the
3-D structural data at the PDBsum website, from which they can be downloaded.
Second, quantitative measurements are included for the secondary structure content (% helix,
strand, coil) in aqueous solution of various peptides and proteins from circular dichroism
studies that are available from the literature [33]. These results may be compared to
those from the available crystallographic data. Finally, graphic displays of secondary
structure predictions from outputs of artificial neural networks can be included [35].
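
The comparison between circular dichroism estimates and crystallographic data can be sketched as follows. This is a minimal illustration under stated assumptions: the per-residue string encoding (`H` for helix, `E` for strand, anything else counted as coil) and both function names are hypothetical, and no measured values from the literature are reproduced here.

```python
from collections import Counter


def secondary_structure_content(ss_string):
    """Percent helix, strand, and coil from a per-residue assignment string."""
    if not ss_string:
        raise ValueError("empty secondary-structure string")
    counts = Counter(ss_string.upper())
    n = len(ss_string)
    helix = 100.0 * counts["H"] / n
    strand = 100.0 * counts["E"] / n
    # Everything that is neither helix nor strand is treated as coil.
    return {"helix": helix, "strand": strand, "coil": 100.0 - helix - strand}


def content_difference(crystal, solution):
    """Crystal-derived minus solution-derived (e.g., CD) content, per class."""
    return {key: crystal[key] - solution[key] for key in crystal}
```

A per-residue string could be derived from the helix and strand annotations of a structure file, and the `solution` dictionary filled with percentages reported from a circular dichroism study.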

Figure 6. Querying the database using the Table Maker utility.


The  user  initially  selects  the  ‘Query’  button  in  the  main  window  to  see  the  query 
menu.  After  selecting  Table  Maker,  the  user  fills  in  the  appropriate  blanks  in  the  form  to 
make  a  list  of  BoNT  assays.  Selecting  the  ‘Search  Whole  Class’  button  generates  the 
search  result.  In  this  output  there  are  fourteen  BotDB  entries  and  five  types  of  assays  for 
BoNT  serotypes  A-E.  The  whole-animal  bioassay  selected  is  given  with  the  route  of 
neurotoxin  administration  used  and  the  observed  lower-limit  of  sensitivity  in  mouse  LD50 
units. 


A portion of BotDB will also be devoted to the kinetics of neurotoxicity, e.g., the
onset of paralysis and persistence of symptoms. The kinetic parameters that are presently
included are those calculated from enzymatic assays (e.g., KM, kcat), the effect of
inhibitors (e.g., Ki), the binding affinity of the neurotoxins for ectoacceptors [11] (Kd),
and the time course of motor paralysis (e.g., time-to-50% block).
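
How the archived parameters relate to one another can be sketched with the standard Michaelis-Menten and single-site binding expressions. The source does not specify the kinetic model behind each stored value, so the competitive-inhibition form and the function names below are assumptions, and any numbers passed in are purely illustrative.

```python
def initial_rate(kcat, km, enzyme, substrate, inhibitor=0.0, ki=None):
    """Michaelis-Menten initial rate, optionally with a competitive inhibitor.

    All concentrations must share one unit; the rate is in that unit per
    time unit of kcat. Competitive inhibition is an assumption made here.
    """
    km_apparent = km if ki is None else km * (1.0 + inhibitor / ki)
    return kcat * enzyme * substrate / (km_apparent + substrate)


def fraction_bound(ligand, kd):
    """Fractional occupancy for simple one-site binding with dissociation constant kd."""
    return ligand / (kd + ligand)
```

With the substrate concentration equal to KM the rate is half of `kcat * enzyme`, and with the ligand concentration equal to Kd half of the sites are occupied, which gives a quick sanity check on stored values.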

DATA  VISUALIZATION 

Two visualization programs are included as auxiliary software that accompanies
BotDB. GifViewer is a freeware utility to view static GIF-formatted files on WINDOWS
systems (DevelCor, www.develcor.com). An example is shown in Figure 4 of a
two-dimensional chemical representation of a small organic chelator molecule (TPEN).
The other program, RasMol (www.umass.edu/microbio/rasmol/index2.htm), is a freeware
package to visualize 3-D molecular representations using stored PDB-formatted files
(Figure 5). This utility enables the user to move and rotate images of proteins and other
molecules in a variety of display formats at the atomic level.
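
The PDB-formatted files that a viewer such as RasMol reads store each atom in a fixed-column text record. The sketch below reads the coordinate columns of ATOM and HETATM records; it is not part of BotDB or RasMol, the names `Atom`, `parse_atoms`, and `centroid` are hypothetical, and the sample record is synthetic rather than taken from a deposited structure.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines):
    """Extract atoms from ATOM/HETATM records using the fixed PDB columns."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


def centroid(atoms):
    """Geometric centre of a non-empty list of atoms."""
    if not atoms:
        raise ValueError("no atoms given")
    n = len(atoms)
    return (sum(a.x for a in atoms) / n,
            sum(a.y for a in atoms) / n,
            sum(a.z for a in atoms) / n)


# Synthetic record built with the column layout above (not real coordinates).
SAMPLE = "ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f" % (
    1, " CA", " ", "MET", "A", 1, " ", 1.0, 2.0, 3.0)
sample_atoms = parse_atoms([SAMPLE])
```

For a stored coordinate file, passing the open file object to `parse_atoms` would yield the atoms that a viewer renders; the file name and location are left to the local installation.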

SEARCH ENGINES


AceDB  contains  two  search  engines,  the  original  AceDB  query  language  and  the 
newer  AQL,  both  of  which  allow  sophisticated  queries  to  be  made.  Results  from  the 
queries  are  displayed  as  relational  tables  that  can  be  exported  as  text  files  to  other 
programs,  e.g.,  spreadsheets,  for  further  computational  analyses.  An  example  is  shown  in 
Figure  6  in  which  the  Table  Maker  utility  builds  a  storable  search  query  for  a  BoNT 
assay.  An  expanded  version  of  this  example  is  included  in  one  of  the  tutorials  that 
accompany  this  database  package. 
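
The export step described above can be sketched outside AceDB as follows. This does not use the AceDB query language or AQL, whose syntax is not shown in this paper; it only illustrates filtering a result table and writing it as tab-delimited text that a spreadsheet can open. The column names and the helper names `select` and `export_table` are hypothetical.

```python
import csv
import io


def select(rows, **criteria):
    """Keep the rows whose fields equal every given criterion."""
    return [
        row for row in rows
        if all(row.get(key) == value for key, value in criteria.items())
    ]


def export_table(rows, columns, delimiter="\t"):
    """Render rows as delimited text; fields missing from a row are left empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter=delimiter,
        restval="",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because missing fields are written as empty cells, incomplete objects can be exported without special handling, and the resulting text can be saved to a file for further analysis elsewhere.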


CONCLUSIONS 

This  database  is  designed  for  use  in  a  variety  of  tasks.  One  of  these  will  be  to 
serve  as  a  local  repository  of  parameters  derived  from  enzymatic  analyses,  inhibitor 
screenings,  and  toxicokinetic  studies.  Parameters  will  also  include  those  from  studies 
focused on the formation of toxin-induced ion channels and on SNARE complex
formation.

Other features of this database could be used to address a number of questions
that remain in this research area. For example, the difficulty in separating the L- and
H-chains underscores the importance of understanding their as yet uncharacterized
non-covalent interactions, so that the structural identity of the toxic moiety can be explained.
A related open question is how the cationic channels formed by the translocation
domain are associated with the escape of the toxic fragment into the neuroplasm.
Areas of further content development will include ganglioside structures and their role in
neurotoxin binding at cholinergic nerve terminals. Despite our long-held knowledge
of the multi-step intoxication process [1], it is evident from the above remarks that much
work remains to clarify further the functional roles played by the molecular machinery of
these neurotoxins.

Future database improvements will include porting BotDB to a UNIX-based
server so that a web-accessible version of this database can be made publicly available.
Similar uses of auxiliary 3-D viewers exist at the SCORPION website and the RDB
receptor database. Future releases of BotDB will provide additional choices for these
viewers as they become available. It is also envisioned that specialized
structure-function databases such as BotDB will be integrated into networks of databases,
despite their diverse schemas and data formats, by converting their data files into a
universally compatible format such as XML or its successor.
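
As a rough illustration of such a conversion, the sketch below serializes one database object to XML with the Python standard library. The element and attribute names (`object`, `tag`, `class`, `name`) and the `PDB_id` tag are hypothetical; no exchange schema is defined in this paper.

```python
import xml.etree.ElementTree as ET


def object_to_xml(cls, name, tags):
    """Serialize one object and its tag/value pairs as an XML string."""
    element = ET.Element("object", {"class": cls, "name": name})
    for tag, value in tags.items():
        child = ET.SubElement(element, "tag", {"name": tag})
        child.text = str(value)
    return ET.tostring(element, encoding="unicode")


def xml_to_object(xml_text):
    """Read the string back into (class, name, tags)."""
    root = ET.fromstring(xml_text)
    tags = {child.get("name"): child.text for child in root.findall("tag")}
    return root.get("class"), root.get("name"), tags
```

A round trip through both functions returns the original class, name, and tags, which is the minimum a shared format needs in order to move objects between databases with different schemas.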